AITopics | reinforcement learning benchmark

reinforcement learning benchmark

RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization

Neural Information Processing Systems, Dec-24-2025, 00:28:18 GMT

name change, reinforcement learning benchmark, rl-vigen, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control

Maus, Tom, Atamna, Asma, Glasmachers, Tobias

arXiv.org Artificial Intelligence, Oct-24-2025

Autonomous control of multi-stage industrial processes requires both local specialization and global coordination. Reinforcement learning (RL) offers a promising approach, but its industrial adoption remains limited due to challenges such as reward design, modularity, and action space management. Many academic benchmarks differ markedly from industrial control problems, limiting their transferability to real-world applications. This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym, into a sequential recycling scenario with sorting and pressing operations. We evaluate two control strategies: a modular architecture with specialized agents and a monolithic agent governing the full system, while also analyzing the impact of action masking. Our experiments show that without action masking, agents struggle to learn effective policies, with the modular architecture performing better. When action masking is applied, both architectures improve substantially, and the performance gap narrows considerably. These results highlight the decisive role of action space constraints and suggest that the advantages of specialization diminish as action complexity is reduced. The proposed benchmark thus provides a valuable testbed for exploring practical and robust multi-agent RL solutions in industrial automation, while contributing to the ongoing debate on centralization versus specialization.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2510.20408

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (1.00)

Industry: Electrical Industrial Apparatus (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
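
The action masking analyzed in the abstract above can be illustrated with a minimal sketch. This is a generic version of the technique, not the benchmark's own implementation: the function name `masked_softmax` and the example logits and mask are assumptions made for illustration. Invalid actions receive a logit of negative infinity, so they get exactly zero probability and the policy can only sample valid actions.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions forced to zero.

    logits: raw policy scores, one per action (hypothetical values).
    mask:   booleans, True where the action is currently valid.
    """
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf so that exp() maps them to exactly 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Action 1 has the highest raw score but is masked out as invalid.
probs = masked_softmax([1.0, 2.0, 0.5], [True, False, True])
```

Masking of this kind shrinks the effective action space at every step, which is consistent with the abstract's observation that both architectures improve once invalid actions are excluded.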

EpiCare: A Reinforcement Learning Benchmark for Dynamic Treatment Regimes

Neural Information Processing Systems, May-27-2025, 20:33:20 GMT

Healthcare applications pose significant challenges to existing reinforcement learning (RL) methods due to implementation risks, low data availability, short treatment episodes, sparse rewards, partial observations, and heterogeneous treatment effects. Despite significant interest in using RL to generate dynamic treatment regimes for longitudinal patient care scenarios, no standardized benchmark has yet been developed. To fill this need we introduce Episodes of Care (EpiCare), a benchmark designed to mimic the challenges associated with applying RL to longitudinal healthcare settings. We leverage this benchmark to test five state-of-the-art offline RL models as well as five common off-policy evaluation (OPE) techniques. Our results suggest that while offline RL may be capable of improving upon existing standards of care given large data availability, its applicability does not appear to extend to the moderate to low data regimes typical of healthcare settings. Additionally, we demonstrate that several OPE techniques which have become standard in the medical RL literature fail to perform adequately on our benchmark. These results suggest that the performance of RL models in dynamic treatment regimes may be difficult to meaningfully evaluate using current OPE methods, indicating that RL for this application may still be in its early stages.

benchmark, dynamic treatment regime, reinforcement learning benchmark, (3 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.85)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
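
As background for the off-policy evaluation (OPE) discussed in the abstract above, here is a minimal sketch of ordinary trajectory-wise importance sampling, one widely used OPE estimator. The abstract does not name the five techniques that were tested, so this is illustrative only. The function name and the toy trajectory format are assumptions: each step is a tuple of the evaluation policy's action probability, the behavior policy's action probability, and the observed reward.

```python
def importance_sampling_estimate(trajectories, gamma=1.0):
    """Estimate the evaluation policy's value from behavior-policy data.

    trajectories: list of episodes; each episode is a list of
                  (pi_e, pi_b, reward) tuples (hypothetical format).
    gamma:        discount factor.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # cumulative likelihood ratio pi_e / pi_b
        ret = 0.0     # discounted return of this episode
        for t, (pi_e, pi_b, reward) in enumerate(traj):
            weight *= pi_e / pi_b
            ret += (gamma ** t) * reward
        total += weight * ret
    return total / len(trajectories)


# Two toy episodes; the second contains an action that the evaluation
# policy would never take, so its weight collapses to zero.
toy_data = [
    [(0.5, 0.5, 1.0), (0.5, 0.5, 1.0)],
    [(1.0, 0.5, 0.0), (0.0, 0.5, 1.0)],
]
estimate = importance_sampling_estimate(toy_data)
```

Because the weight is a product of per-step ratios, a handful of episodes can dominate the estimate when data is scarce, which is one general reason estimators of this kind are hard to trust in low-data settings.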

RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization

Neural Information Processing Systems, Oct-9-2024, 23:29:33 GMT

Visual Reinforcement Learning (Visual RL), coupled with high-dimensional observations, has consistently confronted the long-standing challenge of out-of-distribution generalization. Despite the focus on algorithms aimed at resolving visual generalization problems, we argue that the devil is in the existing benchmarks as they are restricted to isolated tasks and generalization categories, undermining a comprehensive evaluation of agents' visual generalization capabilities. To bridge this gap, we introduce RL-ViGen: a novel Reinforcement Learning Benchmark for Visual Generalization, which contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions. Furthermore, RL-ViGen incorporates the latest generalization visual RL algorithms into a unified framework, under which the experiment results indicate that no single existing algorithm has prevailed universally across tasks. Our aspiration is that RL-ViGen will serve as a catalyst in this area, and lay a foundation for the future creation of universal visual generalization RL agents suitable for real-world scenarios.

reinforcement learning benchmark, rl-vigen, visual generalization, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning Experiments and Benchmark for Solving Robotic Reaching Tasks

Aumjaud, Pierre, McAuliffe, David, Lera, Francisco Javier Rodríguez, Cardiff, Philip

arXiv.org Artificial Intelligence, Nov-11-2020

Reinforcement learning has shown great promise in robotics thanks to its ability to develop efficient robotic control procedures through self-training. In particular, reinforcement learning has been successfully applied to solving the reaching task with robotic arms. In this paper, we define a robust, reproducible and systematic experimental procedure to compare the performance of various model-free algorithms at solving this task. The policies are trained in simulation and are then transferred to a physical robotic manipulator. It is shown that augmenting the reward signal with the Hindsight Experience Replay exploration technique increases the average return of off-policy agents by a factor of 7 to 9 when the target position is initialised randomly at the beginning of each episode.

algorithm, international conference, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-62579-5_22

2011.05782

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Spain > Castile and León > León Province > León (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
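
The Hindsight Experience Replay idea mentioned in the abstract above can be sketched as goal relabeling. This is a simplified illustration using the "final" relabeling strategy, not the paper's experimental code: the transition dictionary layout, the sparse reward, and the tolerance `tol` are all assumptions. Each stored transition is copied with its goal replaced by the position the arm actually reached at the end of the episode, so a failed episode still yields successful examples.

```python
def sparse_reward(achieved, goal, tol=0.05):
    """Return 0.0 if the achieved position is within tol of the goal, else -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode):
    """Relabel an episode with the goal that was actually reached at its end.

    episode: list of dicts with keys "achieved", "goal", "reward"
             (hypothetical layout; "achieved" is the position after the step).
    """
    final_goal = episode[-1]["achieved"]
    return [
        {**step, "goal": final_goal,
         "reward": sparse_reward(step["achieved"], final_goal)}
        for step in episode
    ]


# A failed reach: the target was (1.0, 1.0) but the arm ended at (0.3, 0.1).
target = (1.0, 1.0)
episode = [
    {"achieved": pos, "goal": target, "reward": sparse_reward(pos, target)}
    for pos in [(0.1, 0.0), (0.2, 0.0), (0.3, 0.1)]
]
relabeled = her_relabel(episode)
```

In a typical off-policy setup, both the original and the relabeled transitions would be added to the replay buffer, giving the agent a learning signal even when the randomly initialised target is never reached.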
